在这项工作中,我们解决了共同跟踪手对象姿势并从野外深度点云序列重建形状的具有挑战性,HandTrackNet,以估计框架间的手动运动。我们的HandTrackNet提出了一个新型的手姿势构成典型化模块,以简化跟踪任务,从而产生准确且稳健的手工关节跟踪。然后,我们的管道通过将预测的手关节转换为基于模板的参数手模型mano来重建全手。对于对象跟踪,我们设计了一个简单而有效的模块,该模块从第一帧估算对象SDF并执行基于优化的跟踪。最后,采用联合优化步骤执行联合手和物体推理,从而减轻了闭塞引起的歧义并进一步完善了手姿势。在训练过程中,整个管道仅看到纯粹的合成数据,这些数据与足够的变化并通过深度模拟合成,以易于概括。整个管道与概括差距有关,因此可以直接传输到真实的野外数据。我们在两个真实的手对象交互数据集上评估我们的方法,例如HO3D和DEXYCB,没有任何填充。我们的实验表明,所提出的方法显着优于先前基于深度的手和对象姿势估计和跟踪方法,以9 fps的帧速率运行。
translated by 谷歌翻译
在发展强化学习(RL)培训系统方面取得了重大进展。过去的作品,例如Impala,Apex,Seed RL,样本工厂等,旨在改善系统的整体吞吐量。在本文中,我们试图解决RL训练系统中的常见瓶颈,即平行环境执行,这通常是整个系统中最慢的部分,但很少受到关注。通过针对RL环境的策划设计,我们改善了不同硬件设置的RL环境模拟速度,从笔记本电脑和适度的工作站到NVIDIA DGX-A100等高端机器。在高端机器上,Envpool在Atari环境上的环境执行每秒可实现100万帧,在Mujoco环境上每秒执行300万帧。在笔记本电脑上运行时,Envpool的速度是Python子过程的2.8倍。此外,在开源社区中已经证明了与现有RL培训库的极大兼容性,包括Cleanrl,RL_Games,DeepMind Acme等。最后,Envpool允许研究人员以更快的速度迭代他们的想法,并具有巨大的潜力,并具有巨大的潜力事实上的RL环境执行引擎。示例运行表明,在笔记本电脑上训练Atari Pong和Mujoco Ant只需5分钟即可。 Envpool已经在https://github.com/sail-sg/envpool上开源。
translated by 谷歌翻译
在本文中,我们介绍了Tianshou,这是一个高度模块化的Python库,用于深钢筋学习(DRL),它使用Pytorch作为后端。天舒(Tianshou)打算通过提供DRL算法的灵活和可靠的基础架构来对研究进行研究。它通过统一界面通过20多种经典算法来支持在线和离线培训。为了促进相关的研究并证明天舒的可靠性,我们发布了田肖(Tianshou)的Mujoco环境基准,涵盖了八种具有最先进性能的经典算法。我们通过https://github.com/thu-ml/tianshou/开放源。
translated by 谷歌翻译
As natural language processing (NLP) for gender bias becomes a significant interdisciplinary topic, the prevalent data-driven techniques such as large-scale language models suffer from data inadequacy and biased corpus, especially for languages with insufficient resources such as Chinese. To this end, we propose a Chinese cOrpus foR Gender bIas Probing and Mitigation CORGI-PM, which contains 32.9k sentences with high-quality labels derived by following an annotation scheme specifically developed for gender bias in the Chinese context. Moreover, we address three challenges for automatic textual gender bias mitigation, which requires the models to detect, classify, and mitigate textual gender bias. We also conduct experiments with state-of-the-art language models to provide baselines. To our best knowledge, CORGI-PM is the first sentence-level Chinese corpus for gender bias probing and mitigation.
translated by 谷歌翻译
The task of reconstructing 3D human motion has wideranging applications. The gold standard Motion capture (MoCap) systems are accurate but inaccessible to the general public due to their cost, hardware and space constraints. In contrast, monocular human mesh recovery (HMR) methods are much more accessible than MoCap as they take single-view videos as inputs. Replacing the multi-view Mo- Cap systems with a monocular HMR method would break the current barriers to collecting accurate 3D motion thus making exciting applications like motion analysis and motiondriven animation accessible to the general public. However, performance of existing HMR methods degrade when the video contains challenging and dynamic motion that is not in existing MoCap datasets used for training. This reduces its appeal as dynamic motion is frequently the target in 3D motion recovery in the aforementioned applications. Our study aims to bridge the gap between monocular HMR and multi-view MoCap systems by leveraging information shared across multiple video instances of the same action. We introduce the Neural Motion (NeMo) field. It is optimized to represent the underlying 3D motions across a set of videos of the same action. Empirically, we show that NeMo can recover 3D motion in sports using videos from the Penn Action dataset, where NeMo outperforms existing HMR methods in terms of 2D keypoint detection. To further validate NeMo using 3D metrics, we collected a small MoCap dataset mimicking actions in Penn Action,and show that NeMo achieves better 3D reconstruction compared to various baselines.
translated by 谷歌翻译
Learning with noisy label (LNL) is a classic problem that has been extensively studied for image tasks, but much less for video in the literature. A straightforward migration from images to videos without considering the properties of videos, such as computational cost and redundant information, is not a sound choice. In this paper, we propose two new strategies for video analysis with noisy labels: 1) A lightweight channel selection method dubbed as Channel Truncation for feature-based label noise detection. This method selects the most discriminative channels to split clean and noisy instances in each category; 2) A novel contrastive strategy dubbed as Noise Contrastive Learning, which constructs the relationship between clean and noisy instances to regularize model training. Experiments on three well-known benchmark datasets for video classification show that our proposed tru{\bf N}cat{\bf E}-split-contr{\bf A}s{\bf T} (NEAT) significantly outperforms the existing baselines. By reducing the dimension to 10\% of it, our method achieves over 0.4 noise detection F1-score and 5\% classification accuracy improvement on Mini-Kinetics dataset under severe noise (symmetric-80\%). Thanks to Noise Contrastive Learning, the average classification accuracy improvement on Mini-Kinetics and Sth-Sth-V1 is over 1.6\%.
translated by 谷歌翻译
When a large language model (LLM) performs complex reasoning by chain of thought (CoT), it can be highly sensitive to individual mistakes. We have had to train verifiers to address this issue. As we all know, after human inferring a conclusion, they often check it by re-verifying it, which can avoid some mistakes. We propose a new method called self-verification that uses the conclusion of the CoT as a condition to build a new sample and asks the LLM to re-predict the original conditions which be masked. We calculate an explainable verification score based on the accuracy. This method can improve the accuracy of multiple arithmetics and logical reasoning datasets when using few-shot learning. we have demonstrated that LLMs can conduct explainable self-verification of their own conclusions and achieve competitive reasoning performance. Extensive experimentals have demonstrated that our method can help multiple large language models with self-verification can avoid interference from incorrect CoT. Code is available at \url{https://github.com/WENGSYX/Self-Verification}
translated by 谷歌翻译
Machine Learning (ML) approaches have been used to enhance the detection capabilities of Network Intrusion Detection Systems (NIDSs). Recent work has achieved near-perfect performance by following binary- and multi-class network anomaly detection tasks. Such systems depend on the availability of both (benign and malicious) network data classes during the training phase. However, attack data samples are often challenging to collect in most organisations due to security controls preventing the penetration of known malicious traffic to their networks. Therefore, this paper proposes a Deep One-Class (DOC) classifier for network intrusion detection by only training on benign network data samples. The novel one-class classification architecture consists of a histogram-based deep feed-forward classifier to extract useful network data features and use efficient outlier detection. The DOC classifier has been extensively evaluated using two benchmark NIDS datasets. The results demonstrate its superiority over current state-of-the-art one-class classifiers in terms of detection and false positive rates.
translated by 谷歌翻译
Forward-Looking Sonar (FLS) has started to gain attention in the field of near-bottom close-range underwater inspection because of its high resolution and high framerate features. Although Automatic Target Recognition (ATR) algorithms have been applied tentatively for object-searching tasks, human supervision is still indispensable, especially when involving critical areas. A clear FLS mosaic containing all suspicious information is in demand to help experts deal with tremendous perception data. However, previous work only considered that FLS is working in an ideal system configuration, which assumes an appropriate sonar imaging setup and the availability of accurate positioning data. Without those promises, the intra-frame and inter-frame artifacts will appear and degrade the quality of the final mosaic by making the information of interest invisible. In this paper, we propose a novel blending method for FLS mosaicing which can preserve interested information. A Long-Short Time Sliding Window (LST-SW) is designed to rectify the local statistics of raw sonar images. The statistics are then utilized to construct a Global Variance Map (GVM). The GVM helps to emphasize the useful information contained in images in the blending phase by classifying the informative and featureless pixels, thereby enhancing the quality of final mosaic. The method is verified using data collected in the real environment. The results show that our method can preserve more details in FLS mosaics for human inspection purposes in practice.
translated by 谷歌翻译
As the deep learning rapidly promote, the artificial texts created by generative models are commonly used in news and social media. However, such models can be abused to generate product reviews, fake news, and even fake political content. The paper proposes a solution for the Russian Artificial Text Detection in the Dialogue shared task 2022 (RuATD 2022) to distinguish which model within the list is used to generate this text. We introduce the DeBERTa pre-trained language model with multiple training strategies for this shared task. Extensive experiments conducted on the RuATD dataset validate the effectiveness of our proposed method. Moreover, our submission ranked second place in the evaluation phase for RuATD 2022 (Multi-Class).
translated by 谷歌翻译